Running the MapReduce program shows number of splits:4 for the 2 input paths:
[hadoop@hadoop data]$ hadoop jar hadoop_train-lzo-bzip2.jar com.kun.hadoop.mapreduce.driver.LogETLDriver /input/lzo/ /output/lzo/
19/04/16 19:11:44 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/04/16 19:11:46 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/04/16 19:11:46 INFO input.FileInputFormat: Total input paths to process : 2
19/04/16 19:11:46 INFO mapreduce.JobSubmitter: number of splits:4
Job Counters
    Launched map tasks=4
    Launched reduce tasks=1
    Data-local map tasks=4
    Total time spent by all maps in occupied slots (ms)=427208
    Total time spent by all reduces in occupied slots (ms)=39870
    Total time spent by all map tasks (ms)=427208
    Total time spent by all reduce tasks (ms)=39870
    Total vcore-seconds taken by all map tasks=427208
    Total vcore-seconds taken by all reduce tasks=39870
    Total megabyte-seconds taken by all map tasks=437460992
    Total megabyte-seconds taken by all reduce tasks=40826880
Go to /home/hadoop/data/clear/day=20190416 and generate an index for the LZO file:
[hadoop@hadoop data]$ hadoop jar ~/app/hadoop-lzo-master/target/hadoop-lzo-0.4.21-SNAPSHOT.jar com.hadoop.compression.lzo.LzoIndexer /home/hadoop/data/clear/day=20190416/part-r-00000.lzo
19/04/16 20:15:02 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
19/04/16 20:15:02 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev f1deea9a313f4017dd5323cb8bbb3732c1aaccc5]
19/04/16 20:15:04 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file /home/hadoop/data/clear/day=20190416/part-r-00000.lzo, size 0.02 GB...
19/04/16 20:15:04 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
19/04/16 20:15:05 INFO lzo.LzoIndexer: Completed LZO Indexing in 1.12 seconds (14.48 MB/s). Index size is 1.78 KB.
[hadoop@hadoop data]$
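To make the indexing step more concrete, here is a minimal sketch of what the generated index contains. It assumes the .index file is a flat sequence of 8-byte big-endian offsets, one per compressed LZO block; that layout is my understanding of hadoop-lzo and is not shown in the logs above. The function name read_lzo_index and the sample bytes are hypothetical and only for illustration.

```python
import struct

def read_lzo_index(data: bytes) -> list:
    """Decode index bytes into a list of block start offsets.

    Assumption: each entry is one signed 64-bit big-endian integer.
    Trailing bytes that do not form a full entry are ignored.
    """
    usable = len(data) - len(data) % 8
    return [struct.unpack_from(">q", data, pos)[0] for pos in range(0, usable, 8)]

# Synthetic example, not real data from the job above.
sample = struct.pack(">3q", 0, 65536, 131072)
offsets = read_lzo_index(sample)
print(offsets)
```

Under the same assumption, the 1.78 KB index reported by LzoIndexer would hold about 228 block offsets. These offsets are what allow an LZO-aware input format to start a split at a block boundary instead of reading the whole file in one map task.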
Refresh the partition metadata:
hive> alter table lzo_test add if not exists partition(day='20190416');
OK
Time taken: 11.12 seconds
hive> select * from lzo_test limit 1;
OK
yahu AE W 20190416111803 63.72.55.168 shabi.com - 746411 20190416
Time taken: 6.673 seconds, Fetched: 1 row(s)
hive>
Run MapReduce. First, check the size of the LZO file:
[hadoop@hadoop data]$ hadoop fs -du -s -h /home/hadoop/data/clear/day=20190416/part-r-00000.lzo
16.2 M  16.2 M  /home/hadoop/data/clear/day=20190416/part-r-00000.lzo
[hadoop@hadoop data]$
The query now runs with 2 splits (number of mappers: 2), because my block size is 10 MB:
hive> select count(1) from lzo_test;
Query ID = hadoop_20190416194747_ab77e496-bd78-4aaa-8a32-6bf34e98a2d1
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1555410311484_0006, Tracking URL = http://hadoop:8088/proxy/application_1555410311484_0006/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1555410311484_0006
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
OK
706000
Time taken: 69.802 seconds, Fetched: 1 row(s)
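The 2 mappers follow from the file size (16.2 MB) and the 10 MB block size. The sketch below reproduces that arithmetic. It assumes the split size equals the block size and uses the 1.1 slack factor that Hadoop's FileInputFormat applies before cutting another split. The function name count_splits is hypothetical, and an LZO-aware input format additionally moves split boundaries to indexed block offsets, which this sketch does not model.

```python
SPLIT_SLOP = 1.1  # slack factor: the last split may be up to 10% larger
MB = 1024 * 1024

def count_splits(file_size: int, split_size: int, splittable: bool = True) -> int:
    """Count input splits for one non-empty file.

    splittable=False models an LZO file without an index, which is
    read as a single split regardless of its size.
    """
    if not splittable:
        return 1
    remaining = file_size
    splits = 0
    # Cut full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining != 0:
        splits += 1
    return splits

file_size = int(16.2 * MB)
print(count_splits(file_size, 10 * MB))                    # indexed LZO file
print(count_splits(file_size, 10 * MB, splittable=False))  # no index
```

With the index in place the 16.2 MB file yields one 10 MB split and one split of about 6.2 MB, which matches the number of mappers Hive reports. Without the index the same file would be processed by a single mapper.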